1 Import Libraries

List of used packages: knitr dplyr ggplot2 broom reshape2 janitor plm pwt9 quarto renv shiny targets testthat tidyverse tibble usethis rio lubridate purrr Hmisc plotly hrbrthemes xts seasonal tsbox forecast tseries plotly ggridges shades urca

colorize <- function(x, color) {
  if (knitr::is_latex_output()) {
    sprintf("\\textcolor{%s}{%s}", color, x)
  } else if (knitr::is_html_output()) {
    sprintf("<span style='color: %s;'>%s</span>", color,
      x)
  } else x
}
#webshot::install_phantomjs()

2 Introduction

2.1 Describe Dataset

The dataset used for this project is Daily Delhi Climate, which consists of the following columns: 1. date: Date of format YYYY-MM-DD starting from “2013-01-01” and ending in “2017-01-01”.
2. meantemp: Mean temperature averaged out from multiple 3 hour intervals in a day.
3. humidity: Humidity value for the day (units are grams of water vapor per cubic meter volume of air).
4. wind_speed: Wind speed measured in kmph.
5. mean_pressure: Pressure reading of weather (measure in atm)

Q1. How can I find out if analyzing meantemp indvidiually (univariate) suffices, or if including other columns and perform a multivariate analysis would worth the effort to find more meaninful patterns and forecasts?

2.2 Goal and Procedure

The goal of this project is to analyze and forecast the mean temperature Delhi, which is recorded in the meantemp column. For this, after importing the dataset, outliers are removed in Preprocessing Section section. Then meantemp column is assigned to a time series object in Construct Time Series section for further processing, analysis, and forecast. After detecting seasonalities using plots, the time series is seasonally adjusted using X13-ARIMA-SEATS. Then, remaining trend is removed in detrend.

Before forecasting the time series, I check for stationarity of time series, as stationarity is an assumption in ARIMA model. For this purpose, unit root tests are applied in Stationarity section.

Finally, I used ARIMA model to forecast the time series in Forecast Time Series section.

df_train <- read_csv("data/DailyDelhiClimateTrain.csv")
## Rows: 1462 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (4): meantemp, humidity, wind_speed, meanpressure
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_test <- read_csv("data/DailyDelhiClimateTest.csv")
## Rows: 114 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (4): meantemp, humidity, wind_speed, meanpressure
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(df_train)
##       date               meantemp        humidity        wind_speed    
##  Min.   :2013-01-01   Min.   : 6.00   Min.   : 13.43   Min.   : 0.000  
##  1st Qu.:2014-01-01   1st Qu.:18.86   1st Qu.: 50.38   1st Qu.: 3.475  
##  Median :2015-01-01   Median :27.71   Median : 62.62   Median : 6.222  
##  Mean   :2015-01-01   Mean   :25.50   Mean   : 60.77   Mean   : 6.802  
##  3rd Qu.:2016-01-01   3rd Qu.:31.31   3rd Qu.: 72.22   3rd Qu.: 9.238  
##  Max.   :2017-01-01   Max.   :38.71   Max.   :100.00   Max.   :42.220  
##   meanpressure     
##  Min.   :  -3.042  
##  1st Qu.:1001.580  
##  Median :1008.563  
##  Mean   :1011.105  
##  3rd Qu.:1014.945  
##  Max.   :7679.333
df_train |> describe()
## df_train 
## 
##  5  Variables      1462  Observations
## --------------------------------------------------------------------------------
## date 
##          n    missing   distinct       Info       Mean        Gmd        .05 
##       1462          0       1462          1 2015-01-01      487.7 2013-03-15 
##        .10        .25        .50        .75        .90        .95 
## 2013-05-27 2014-01-01 2015-01-01 2016-01-01 2016-08-07 2016-10-19 
## 
## lowest : 2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05
## highest: 2016-12-28 2016-12-29 2016-12-30 2016-12-31 2017-01-01
## --------------------------------------------------------------------------------
## meantemp 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1462        0      617        1     25.5    8.331    12.51    14.62 
##      .25      .50      .75      .90      .95 
##    18.86    27.71    31.31    33.71    35.43 
## 
## lowest :  6.000000  7.000000  7.166667  7.400000  8.666667
## highest: 38.200000 38.272727 38.428571 38.500000 38.714286
## --------------------------------------------------------------------------------
## humidity 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1462        0      897        1    60.77    18.96    29.00    36.21 
##      .25      .50      .75      .90      .95 
##    50.38    62.62    72.22    81.85    86.99 
## 
## lowest :  13.42857  15.85714  18.46667  19.00000  19.50000
## highest:  96.12500  96.62500  96.85714  98.00000 100.00000
## --------------------------------------------------------------------------------
## wind_speed 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1462        0      736        1    6.802    4.878    0.925    1.625 
##      .25      .50      .75      .90      .95 
##    3.475    6.222    9.238   12.608   14.813 
## 
## lowest :  0.0000000  0.4625000  0.4750000  0.5285714  0.6166667
## highest: 27.7750000 30.6857143 33.3250000 34.4875000 42.2200000
## --------------------------------------------------------------------------------
## meanpressure 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1462        0      626        1     1011    22.92    996.8    998.1 
##      .25      .50      .75      .90      .95 
##   1001.6   1008.6   1014.9   1017.8   1019.3 
## 
## lowest :   -3.041667   12.045455  310.437500  633.900000  938.066667
## highest: 1022.125000 1023.000000 1350.296296 1352.615385 7679.333333
##                                                     
## Value          0   300   600   900  1000  1400  7700
## Frequency      2     1     1     2  1453     2     1
## Proportion 0.001 0.001 0.001 0.001 0.994 0.001 0.001
## 
## For the frequency table, variable is rounded to the nearest 100
## --------------------------------------------------------------------------------

3 Visualize Data

Below we can see interactive plot of the time series.

p <- df_train |>
  ggplot( aes(x=date, y=meantemp)) +
    geom_area(fill="#69b3a2", alpha=0.5) +
    geom_line(color="#69b3a2") +
    ylab("bitcoin price ($)") +
    theme_ipsum()

# Turn it interactive with ggplotly
p <- ggplotly(p)
#p
p

4 Preprocessing

We can detect an outlier at the last observation (last row of dataframe). It causes an abrupt decrease in value of temperature. This would lead to problems in further analysis I will proceed and also when I later apply functions on the time series. Therefore, I replace the last observation’s value with its previous one.

Q2. Is imputing with the last observation the right approach here? Or is preferrable that I use median of last n (rows) for instance? Maybe it won’t matter, as when I aggregate data (per week, month, week, etc.), I remove the last observation.

previous_value <- df_train$meantemp[df_train$date == as.Date('2016-12-31')]

df_train$meantemp[df_train$date == as.Date('2017-01-01')]<- previous_value 
#df_train <- head(df_train, -1)
head(df_train)
## # A tibble: 6 × 5
##   date       meantemp humidity wind_speed meanpressure
##   <date>        <dbl>    <dbl>      <dbl>        <dbl>
## 1 2013-01-01    10        84.5       0           1016.
## 2 2013-01-02     7.4      92         2.98        1018.
## 3 2013-01-03     7.17     87         4.63        1019.
## 4 2013-01-04     8.67     71.3       1.23        1017.
## 5 2013-01-05     6        86.8       3.7         1016.
## 6 2013-01-06     7        82.8       1.48        1018
tail(df_train)
## # A tibble: 6 × 5
##   date       meantemp humidity wind_speed meanpressure
##   <date>        <dbl>    <dbl>      <dbl>        <dbl>
## 1 2016-12-27     16.8     67.6       8.34        1017.
## 2 2016-12-28     17.2     68.0       3.55        1016.
## 3 2016-12-29     15.2     87.9       6           1017.
## 4 2016-12-30     14.1     89.7       6.27        1018.
## 5 2016-12-31     15.1     87         7.32        1016.
## 6 2017-01-01     15.1    100         0           1016

Let us how the plot looks after removing the outlier:

p <- df_train |>
  ggplot( aes(x=date, y=meantemp)) +
    geom_area(fill="#69b3a2", alpha=0.5) +
    geom_line(color="#69b3a2") +
    ylab("bitcoin price ($)") +
    theme_ipsum()

# Turn it interactive with ggplotly
p <- ggplotly(p)
#p
p

Find if there is any missing dates

date_range <- seq(min(df_train$date), max(df_train$date), by = 1) 
date_range[!date_range %in% df_train$date] 
## Date of length 0

5 Prepare Test Set

summary(df_test)
##       date               meantemp        humidity       wind_speed    
##  Min.   :2017-01-01   Min.   :11.00   Min.   :17.75   Min.   : 1.387  
##  1st Qu.:2017-01-29   1st Qu.:16.44   1st Qu.:39.62   1st Qu.: 5.564  
##  Median :2017-02-26   Median :19.88   Median :57.75   Median : 8.069  
##  Mean   :2017-02-26   Mean   :21.71   Mean   :56.26   Mean   : 8.144  
##  3rd Qu.:2017-03-26   3rd Qu.:27.71   3rd Qu.:71.90   3rd Qu.:10.069  
##  Max.   :2017-04-24   Max.   :34.50   Max.   :95.83   Max.   :19.314  
##   meanpressure 
##  Min.   :  59  
##  1st Qu.:1007  
##  Median :1013  
##  Mean   :1004  
##  3rd Qu.:1017  
##  Max.   :1023
df_test |> describe()
## df_test 
## 
##  5  Variables      114  Observations
## --------------------------------------------------------------------------------
## date 
##          n    missing   distinct       Info       Mean        Gmd        .05 
##        114          0        114          1 2017-02-26      38.33 2017-01-06 
##        .10        .25        .50        .75        .90        .95 
## 2017-01-12 2017-01-29 2017-02-26 2017-03-26 2017-04-12 2017-04-18 
## 
## lowest : 2017-01-01 2017-01-02 2017-01-03 2017-01-04 2017-01-05
## highest: 2017-04-20 2017-04-21 2017-04-22 2017-04-23 2017-04-24
## --------------------------------------------------------------------------------
## meantemp 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      114        0      105        1    21.71    7.226    13.22    14.75 
##      .25      .50      .75      .90      .95 
##    16.44    19.88    27.71    31.00    32.67 
## 
## lowest : 11.00000 11.72222 11.78947 12.11111 13.04167
## highest: 32.90000 33.50000 34.00000 34.25000 34.50000
## --------------------------------------------------------------------------------
## humidity 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      114        0      109        1    56.26    21.97    26.74    29.49 
##      .25      .50      .75      .90      .95 
##    39.62    57.75    71.90    78.42    82.20 
## 
## lowest : 17.75000 19.42857 21.12500 24.12500 26.00000
## highest: 83.52632 84.44444 85.86957 91.64286 95.83333
## --------------------------------------------------------------------------------
## wind_speed 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      114        0      109        1    8.144    4.031    2.842    3.715 
##      .25      .50      .75      .90      .95 
##    5.564    8.069   10.069   13.464   14.353 
## 
## lowest :  1.387500  1.625000  1.854545  1.950000  2.100000
## highest: 14.384615 15.512500 16.662500 17.590000 19.314286
## --------------------------------------------------------------------------------
## meanpressure 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##      114        0      109        1     1004    23.16     1002     1004 
##      .25      .50      .75      .90      .95 
##     1007     1013     1017     1019     1021 
## 
## lowest :   59.000  998.625  999.875 1000.875 1001.600
## highest: 1021.375 1021.556 1021.789 1021.958 1022.810
##                                   
## Value         60  1000  1010  1020
## Frequency      1    14    53    46
## Proportion 0.009 0.123 0.465 0.404
## 
## For the frequency table, variable is rounded to the nearest 10
## --------------------------------------------------------------------------------
xts_test_meantemp <- xts(df_test$meantemp, order.by=df_test$date, "%Y-%m-%d")
head(xts_test_meantemp)
##                [,1]
## 2017-01-01 15.91304
## 2017-01-02 18.50000
## 2017-01-03 17.11111
## 2017-01-04 18.70000
## 2017-01-05 18.38889
## 2017-01-06 19.31818
tail(xts_test_meantemp)
##              [,1]
## 2017-04-19 33.500
## 2017-04-20 34.500
## 2017-04-21 34.250
## 2017-04-22 32.900
## 2017-04-23 32.875
## 2017-04-24 32.000
ts_plot(xts_test_meantemp)

ts_test_meantemp <- ts_ts(xts_test_meantemp)
xts_week_test_meantemp <- apply.weekly(xts_test_meantemp,sum)
ts_week_test_meantemp <- na.remove(ts_ts(xts_week_test_meantemp))
#ts_week_test_meantemp <- as.ts(xts_week_test_meantemp)
length(ts_week_test_meantemp)
## [1] 18
ts_plot(xts_week_test_meantemp)

6 Time Series Analysis

6.1 Construct Time Series

Now we use the meantemp column to create our time series data. I assigned the time series to xts objects. But since many functions later require ts object, each time I define an xts, I also convert it to ts object using tsbox::ts_ts

min(df_train$date)
## [1] "2013-01-01"
max(df_train$date)
## [1] "2017-01-01"
#ts_train <- zoo(df_train$meantemp, df_train$date)

xts_train_meantemp <- xts(df_train$meantemp, order.by=df_train$date, "%Y-%m-%d")
class(xts_train_meantemp)
## [1] "xts" "zoo"
head(xts_train_meantemp)
##                 [,1]
## 2013-01-01 10.000000
## 2013-01-02  7.400000
## 2013-01-03  7.166667
## 2013-01-04  8.666667
## 2013-01-05  6.000000
## 2013-01-06  7.000000
tail(xts_train_meantemp)
##                [,1]
## 2016-12-27 16.85000
## 2016-12-28 17.21739
## 2016-12-29 15.23810
## 2016-12-30 14.09524
## 2016-12-31 15.05263
## 2017-01-01 15.05263
# convert xts to ts

## Create a daily Date object for ts
#inds <- seq(as.Date("2013-01-01"), as.Date("2017-01-01"), by = "day")

#set.seed(25)
#ts_train <- ts(df_train$meantemp,     # random data
#           start = c(2013, as.numeric(format(inds[1], "%j"))),
#           frequency = 365)


#ts_train <- ts(df_train$meantemp, start = decimal_date(ymd("2013-01-01")), frequency = 365.25 / 7)

Convert XTS objects to TS objects:

ts_train_meantemp <-ts_ts(xts_train_meantemp)
head(ts_train_meantemp)
## Time Series:
## Start = 2013 
## End = 2013.01368953503 
## Frequency = 365.2425 
## [1] 10.000000  7.400000  7.166667  8.666667  6.000000  7.000000
tail(ts_train_meantemp)
## Time Series:
## Start = 2016.98639260218 
## End = 2017.00008213721 
## Frequency = 365.2425 
## [1] 16.85000 17.21739 15.23810 14.09524 15.05263 15.05263

I plot a static plot as well:

ts_plot(xts_train_meantemp)

6.2 Seasonality

From the initial plot I judge that there is seasonali. For more delicate observation to find if there is more granular periods of seasonality, I use seasonality plots. Before that, I aggregate data weekly, monthly, and quarterly.

6.2.1 Seasonality Plots

# Weekly mean temperature
xts_week_train_meantemp <- apply.weekly(xts_train_meantemp,sum)
ts_week_train_meantemp <-ts_ts(xts_week_train_meantemp)

# Monthly mean temperature
xts_mon_train_meantemp <- aggregate(xts_train_meantemp, by=as.yearmon, FUN=sum)
ts_mon_train_meantemp <-ts_ts(xts_mon_train_meantemp)

# Quarterly mean temperature
xts_quar_train_meantemp <- aggregate(xts_train_meantemp, as.yearqtr, FUN=sum)
ts_quar_train_meantemp <-ts_ts(xts_quar_train_meantemp)


# Yearly mean temperate
as.year <- function(x) as.integer(as.yearmon(x))
xts_year_train_meantemp <- aggregate(xts_train_meantemp, by=as.year, FUN=sum)
#ts_year_train_meantemp <-ts_ts(xts_year_train_meantemp)
#xts_year_train_meantemp[1]

The year 2017 has only one observation, so I remove it from all the aggregated datasets. I couldn’t do it before aggregating, otherwise I would have confronted the error Error: series has no regular pattern.

xts_week_train_meantemp <- head(xts_week_train_meantemp, -1)
xts_mon_train_meantemp <- head(xts_mon_train_meantemp, -1)
xts_quar_train_meantemp <- head(xts_quar_train_meantemp, -1)

ts_week_train_meantemp <- head(ts_week_train_meantemp, -1)
ts_mon_train_meantemp <- head(ts_mon_train_meantemp, -1)
ts_quar_train_meantemp <- head(ts_quar_train_meantemp, -1)
#options(repr.plot.width = 7, repr.plot.height =20)
forecast::ggseasonplot(ts_mon_train_meantemp, year.labels=TRUE, year.labels.left=TRUE, labelgap = 0.1) +
  ylab("degree") +
  ggtitle("Seasonal plot: Monthly Mean Temperature")

forecast::ggseasonplot(ts_mon_train_meantemp, year.labels=TRUE, year.labels.left=TRUE, labelgap = 0.1, polar=TRUE) +
  ylab("degree") +
  ggtitle("Polar Seasonal plot: Monthly Mean Temperature")

#options(repr.plot.width = 7, repr.plot.height =20)
forecast::ggseasonplot(ts_quar_train_meantemp, year.labels=TRUE, year.labels.left=TRUE, labelgap = 0.1) +
  ylab("degree") +
  ggtitle("Seasonal plot: Quarterly Mean Temperature")

forecast::ggseasonplot(ts_quar_train_meantemp, year.labels=TRUE, year.labels.left=TRUE, labelgap = 0.1, polar=TRUE) +
  ylab("degree") +
  ggtitle("Polar Seasonal plot: Quarterly Mean Temperature")

6.2.2 Deseasonalize

If I need to remove different periods of seasonality together, I would need to use the forecast:msts function. For instance in below I remove weekly and yearly seasonality together.

des_ts_train_meantemp <- msts(xts_train_meantemp,seasonal.periods = c(7,365))
#head(des_xts_train)
#library(tsbox)
#ts_train <-ts_ts(xts_train)
#ts_train

class(des_ts_train_meantemp)
## [1] "msts" "ts"

However, since its output had an unfamiliar and weird shape to me, and also since I wasn’t sure it uses the state-of-the-art X13 decomposition, I incorporated the X-13ARIMA-SEATS using seasonal:seas function. However, it has some limitations, as stated in the package’s reference manua. For instance, the number of observations must not exceed 780. Nor should maximum seasonal period exceed 12. That is why I couldn’t use original data ts_train and also the weekly aggregated data ts_week_train, as I would confront the error Seasonal period too large. The only possible aggregated data with highest frequency possible was monthly aggregated, ts_mon_train. However, I am concerned that I would lose significant pattern and information with this amount of aggregation.

Q3. If you could kindly share your viewpoint here, it would be very helpful for me to ensure how to proceed.

length(xts_train_meantemp)
## [1] 1462
length(ts_train_meantemp)
## [1] 1462
length(xts_train_meantemp)
## [1] 1462
nowXTS <-ts_xts(ts_train_meantemp)
length(nowXTS)
## [1] 1462
length(ts_week_train_meantemp)
## [1] 208
plot(ts_week_train_meantemp)

length(ts_week_train_meantemp)
## [1] 208
plot(ts_train_meantemp)

length(ts_train_meantemp)
## [1] 1462
plot(ts_mon_train_meantemp)

length(ts_mon_train_meantemp)
## [1] 48
m <- seas(ts_mon_train_meantemp)
ts_train_adj_meantemp <- final(m)
#ts_train_adj
length(ts_train_adj_meantemp)
## [1] 48
m <- seas(ts_mon_train_meantemp)
ts_train_adj_meantemp <- final(m)
#ts_train_adj
length(ts_train_adj_meantemp)
## [1] 48
plot(ts_train_adj_meantemp)

Plot original data along with trend and seasonally adjusted data

#ts_train
#series(m, "forecast.forecasts")
#out(m)
#seasadj(m)
autoplot(ts_mon_train_meantemp, series="Original Data") +
autolayer(trendcycle(m), series="Trend") +
autolayer(seasadj(m), series="Seasonally Adjusted") +
xlab("Year") + ylab("Mean Temperature") +
ggtitle("Mean Temperature Decomposed using X13") +
scale_colour_manual(values=c("gray","blue","red"),
           breaks=c("Original Data","Seasonally Adjusted","Trend"))

#ap < ggplotly(ap)

6.3 Detrend

In the seasonally adjusted time series ts_train_adj, I detected a trend, therefore I detrend it using differencing.

#ts_train_adj_meantemp |> log() |> nsdiffs(alpha=0.01) -> ts_train_adj_det_meantemp
ts_train_adj_meantemp |> log() |> diff() -> ts_train_adj_det_meantemp
plot(ts_train_adj_det_meantemp)

#plot(d)

6.4 Correlation Plots

  1. Weekly aggregated of original time series
ggAcf(ts_week_train_meantemp, lag=50)

pacf (ts_week_train_meantemp, lag=50, pl = TRUE)

  1. Seasonally Adjusted
ggAcf(ts_train_adj_meantemp, lag=10)

pacf (ts_train_adj_meantemp, lag=10, pl = TRUE)

  1. Seasonally Adjusted and Detrended
ggAcf(ts_train_adj_det_meantemp, lag=10)

pacf (ts_train_adj_det_meantemp, lag=10, pl = TRUE)

6.5 Testing Stationarity: Unit Root Tests

6.5.1 ADF

  1. Original Time Series and its weekly adjusted
ts_train_meantemp |> adf.test()
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_train_meantemp
## Dickey-Fuller = -1.9871, Lag order = 11, p-value = 0.5838
## alternative hypothesis: stationary
ts_week_train_meantemp |> adf.test()
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_week_train_meantemp
## Dickey-Fuller = -3.8729, Lag order = 5, p-value = 0.01651
## alternative hypothesis: stationary
  1. Seasonally Adjusted
ts_train_adj_meantemp |> adf.test() 
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_train_adj_meantemp
## Dickey-Fuller = -2.5897, Lag order = 3, p-value = 0.3386
## alternative hypothesis: stationary
  1. Seasonally Adjusted and Detrended
ts_train_adj_det_meantemp |> adf.test() 
## Warning in adf.test(ts_train_adj_det_meantemp): p-value smaller than printed p-
## value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_train_adj_det_meantemp
## Dickey-Fuller = -4.698, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

6.5.2 KPSS

  1. Original Time Series and also its weekly aggregated
ts_train_meantemp |> ur.kpss() |> summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 7 lags. 
## 
## Value of test-statistic is: 0.5812 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ts_week_train_meantemp |> ur.kpss() |> summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.1618 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
  1. Seasonally Adjusted
ts_train_adj_meantemp |> ur.kpss() |> summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.9974 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
  1. Seasonally Adjusted and Detrended
ts_train_adj_det_meantemp |> ur.kpss() |> summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.0595 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

6.5.3 Dickey–Fuller

  1. Original Time Series and also its weekly aggregated
ts_train_meantemp |> ur.df() |> summary()
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.6532 -0.8403  0.1233  1.0729  6.5542 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.001560   0.001622  -0.962    0.336    
## z.diff.lag -0.158894   0.025836  -6.150 9.98e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.644 on 1458 degrees of freedom
## Multiple R-squared:  0.02616,    Adjusted R-squared:  0.02483 
## F-statistic: 19.59 on 2 and 1458 DF,  p-value: 4.036e-09
## 
## 
## Value of test-statistic is: -0.9621 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
ts_week_train_meantemp |> ur.df() |> summary()
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -35.318 -10.542   1.810   9.478  33.612 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## z.lag.1    -0.002234   0.005344  -0.418    0.676
## z.diff.lag -0.050285   0.068684  -0.732    0.465
## 
## Residual standard error: 14.27 on 204 degrees of freedom
## Multiple R-squared:  0.003633,   Adjusted R-squared:  -0.006135 
## F-statistic: 0.3719 on 2 and 204 DF,  p-value: 0.6899
## 
## 
## Value of test-statistic is: -0.418 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62
  1. Seasonally Adjusted
ts_train_adj_meantemp |> ur.df() |> summary()
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -51.384 -13.881  -1.239  16.337  55.819 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)   
## z.lag.1     0.00375    0.00445   0.843   0.4040   
## z.diff.lag -0.46189    0.13223  -3.493   0.0011 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.31 on 44 degrees of freedom
## Multiple R-squared:  0.2201, Adjusted R-squared:  0.1846 
## F-statistic: 6.208 on 2 and 44 DF,  p-value: 0.004217
## 
## 
## Value of test-statistic is: 0.8427 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.62 -1.95 -1.61
  1. Seasonally Adjusted and Detrended
ts_train_adj_det_meantemp |> ur.df() |> summary()
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.062429 -0.012003  0.005823  0.020854  0.073373 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1     -1.8387     0.2483  -7.407 3.33e-09 ***
## z.diff.lag   0.2468     0.1441   1.713   0.0939 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0295 on 43 degrees of freedom
## Multiple R-squared:  0.7547, Adjusted R-squared:  0.7433 
## F-statistic: 66.16 on 2 and 43 DF,  p-value: 7.539e-14
## 
## 
## Value of test-statistic is: -7.4065 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.62 -1.95 -1.61

7 Time Series Forecasting

7.1 SARIMA

  1. Forecast original time series of meantemp, as the original data has very high frequency, which makes it unsuitable for ARMA. For this case, I set seasonal=TRUE, as in cases 3m I use data that I seasonally adjusted them already. Setting seasonal=TRUE makes the model more time-consuming.
#forecast_ts_train_meantemp = auto.arima(ts_train_meantemp,
#                            trace = TRUE, 
#                            seasonal=TRUE,
#                            stepwise=FALSE,
#                            approximation=FALSE)
#checkresiduals(forecast_ts_train_meantemp)
forecast_ts_train_meantemp <- auto.arima(ts_train_meantemp,
                            d = 1,
                            D = 1,
                            start.p = 2,
                            start.q = 3,
                            max.p = 2,
                            #max.d = 1,
                            max.q = 3,
                            start.P = 0,
                            start.Q = 0,
                            max.P = 0,
                            #max.D = 1,
                            max.Q = 0,
                            trace = TRUE, 
                            seasonal=TRUE,
                            stepwise=TRUE,
                            approximation=FALSE
                            #nmodels=
                            )
## 
##  ARIMA(2,1,3)(0,1,0)[365]                    : 4789.495
##  ARIMA(0,1,0)(0,1,0)[365]                    : 4943.724
##  ARIMA(1,1,0)(0,1,0)[365]                    : 4926.887
##  ARIMA(0,1,1)(0,1,0)[365]                    : 4920.711
##  ARIMA(1,1,3)(0,1,0)[365]                    : 4789.51
##  ARIMA(2,1,2)(0,1,0)[365]                    : Inf
##  ARIMA(1,1,2)(0,1,0)[365]                    : 4790.218
## 
##  Best model: ARIMA(2,1,3)(0,1,0)[365]
checkresiduals(forecast_ts_train_meantemp)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,3)(0,1,0)[365]
## Q* = 363.76, df = 287, p-value = 0.001427
## 
## Model df: 5.   Total lags used: 292
forecast_ts_train_meantemp
## Series: ts_train_meantemp 
## ARIMA(2,1,3)(0,1,0)[365] 
## 
## Coefficients:
##          ar1     ar2      ma1      ma2      ma3
##       0.0104  0.4362  -0.2760  -0.6134  -0.0888
## s.e.  0.3402  0.2577   0.3395   0.3585   0.0480
## 
## sigma^2 = 4.561:  log likelihood = -2388.71
## AIC=4789.42   AICc=4789.5   BIC=4819.41
  1. Forecast original time series of meantemp but aggregated weekly, as the original data has very high frequency, which makes it unsuitable for ARMA. For this case, I set seasonal=TRUE, as in case 3, I use data that I seasonally adjusted them already. Setting seasonal=TRUE makes the model more time-consuming.
forecast_ts_week_train_meantemp = auto.arima(ts_week_train_meantemp,
                            trace = TRUE, 
                            seasonal=TRUE,
                            stepwise=FALSE,
                            approximation=FALSE)
## Warning: The time series frequency has been rounded to support seasonal
## differencing.
## 
##  ARIMA(0,1,0)(0,1,0)[52]                    : 1348.451
##  ARIMA(0,1,0)(0,1,1)[52]                    : Inf
##  ARIMA(0,1,0)(1,1,0)[52]                    : 1317.269
##  ARIMA(0,1,0)(1,1,1)[52]                    : Inf
##  ARIMA(0,1,1)(0,1,0)[52]                    : 1304.937
##  ARIMA(0,1,1)(0,1,1)[52]                    : Inf
##  ARIMA(0,1,1)(1,1,0)[52]                    : 1276.491
##  ARIMA(0,1,1)(1,1,1)[52]                    : Inf
##  ARIMA(0,1,2)(0,1,0)[52]                    : 1298.502
##  ARIMA(0,1,2)(0,1,1)[52]                    : Inf
##  ARIMA(0,1,2)(1,1,0)[52]                    : 1271.798
##  ARIMA(0,1,2)(1,1,1)[52]                    : Inf
##  ARIMA(0,1,3)(0,1,0)[52]                    : 1300.128
##  ARIMA(0,1,3)(0,1,1)[52]                    : Inf
##  ARIMA(0,1,3)(1,1,0)[52]                    : 1273.906
##  ARIMA(0,1,3)(1,1,1)[52]                    : Inf
##  ARIMA(0,1,4)(0,1,0)[52]                    : 1302.264
##  ARIMA(0,1,4)(0,1,1)[52]                    : Inf
##  ARIMA(0,1,4)(1,1,0)[52]                    : 1275.931
##  ARIMA(0,1,5)(0,1,0)[52]                    : 1301.246
##  ARIMA(1,1,0)(0,1,0)[52]                    : 1333.282
##  ARIMA(1,1,0)(0,1,1)[52]                    : Inf
##  ARIMA(1,1,0)(1,1,0)[52]                    : 1303.962
##  ARIMA(1,1,0)(1,1,1)[52]                    : Inf
##  ARIMA(1,1,1)(0,1,0)[52]                    : 1298.238
##  ARIMA(1,1,1)(0,1,1)[52]                    : Inf
##  ARIMA(1,1,1)(1,1,0)[52.17869]                    : Inf
##  ARIMA(1,1,1)(1,1,1)[52.17869]                    : Inf
##  ARIMA(1,1,2)(0,1,0)[52]                    : 1300.221
##  ARIMA(1,1,2)(0,1,1)[52]                    : Inf
##  ARIMA(1,1,2)(1,1,0)[52.17869]                    : Inf
##  ARIMA(1,1,2)(1,1,1)[52.17869]                    : Inf
##  ARIMA(1,1,3)(0,1,0)[52]                    : 1302.264
##  ARIMA(1,1,3)(0,1,1)[52]                    : Inf
##  ARIMA(1,1,3)(1,1,0)[52.17869]                    : Inf
##  ARIMA(1,1,4)(0,1,0)[52]                    : 1304.398
##  ARIMA(2,1,0)(0,1,0)[52]                    : 1325.56
##  ARIMA(2,1,0)(0,1,1)[52]                    : Inf
##  ARIMA(2,1,0)(1,1,0)[52]                    : 1295.714
##  ARIMA(2,1,0)(1,1,1)[52]                    : Inf
##  ARIMA(2,1,1)(0,1,0)[52]                    : 1300.196
##  ARIMA(2,1,1)(0,1,1)[52]                    : Inf
##  ARIMA(2,1,1)(1,1,0)[52]                    : Inf
##  ARIMA(2,1,1)(1,1,1)[52]                    : Inf
##  ARIMA(2,1,2)(0,1,0)[52]                    : 1302.316
##  ARIMA(2,1,2)(0,1,1)[52]                    : Inf
##  ARIMA(2,1,2)(1,1,0)[52]                    : 1274.862
##  ARIMA(2,1,3)(0,1,0)[52]                    : 1299.815
##  ARIMA(3,1,0)(0,1,0)[52]                    : 1317.794
##  ARIMA(3,1,0)(0,1,1)[52]                    : Inf
##  ARIMA(3,1,0)(1,1,0)[52]                    : 1286.975
##  ARIMA(3,1,0)(1,1,1)[52]                    : Inf
##  ARIMA(3,1,1)(0,1,0)[52]                    : 1302.228
##  ARIMA(3,1,1)(0,1,1)[52]                    : Inf
##  ARIMA(3,1,1)(1,1,0)[52]                    : 1274.048
##  ARIMA(3,1,2)(0,1,0)[52]                    : 1303.9
##  ARIMA(4,1,0)(0,1,0)[52]                    : 1299.762
##  ARIMA(4,1,0)(0,1,1)[52]                    : Inf
##  ARIMA(4,1,0)(1,1,0)[52]                    : 1269.65
##  ARIMA(4,1,1)(0,1,0)[52]                    : 1301.905
##  ARIMA(5,1,0)(0,1,0)[52]                    : 1301.906
## 
## 
## 
##  Best model: ARIMA(4,1,0)(1,1,0)[52]
checkresiduals(forecast_ts_week_train_meantemp)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(4,1,0)(1,1,0)[52]
## Q* = 33.201, df = 37, p-value = 0.6478
## 
## Model df: 5.   Total lags used: 42
forecast_ts_week_train_meantemp
## Series: ts_week_train_meantemp 
## ARIMA(4,1,0)(1,1,0)[52] 
## 
## Coefficients:
##           ar1      ar2      ar3      ar4     sar1
##       -0.5755  -0.5066  -0.4439  -0.3623  -0.5049
## s.e.   0.0786   0.0849   0.0844   0.0792   0.0746
## 
## sigma^2 = 181.4:  log likelihood = -628.54
## AIC=1269.08   AICc=1269.65   BIC=1287.34
  1. Forecast deseasonalized time series
forecast_ts_train_adj_meantemp = auto.arima(ts_train_adj_meantemp,
                            trace = TRUE, 
                            seasonal= FALSE,
                            stepwise=FALSE,
                            approximation=FALSE)
## 
##  ARIMA(0,1,0)                               : 441.3883
##  ARIMA(0,1,0)            with drift         : 443.1154
##  ARIMA(0,1,1)                               : 429.9577
##  ARIMA(0,1,1)            with drift         : 429.5371
##  ARIMA(0,1,2)                               : 432.2379
##  ARIMA(0,1,2)            with drift         : 431.8095
##  ARIMA(0,1,3)                               : 434.4112
##  ARIMA(0,1,3)            with drift         : 433.7996
##  ARIMA(0,1,4)                               : 434.6353
##  ARIMA(0,1,4)            with drift         : Inf
##  ARIMA(0,1,5)                               : Inf
##  ARIMA(0,1,5)            with drift         : Inf
##  ARIMA(1,1,0)                               : 432.8756
##  ARIMA(1,1,0)            with drift         : 434.1589
##  ARIMA(1,1,1)                               : 432.2366
##  ARIMA(1,1,1)            with drift         : 431.7635
##  ARIMA(1,1,2)                               : 434.6252
##  ARIMA(1,1,2)            with drift         : Inf
##  ARIMA(1,1,3)                               : 436.6632
##  ARIMA(1,1,3)            with drift         : Inf
##  ARIMA(1,1,4)                               : 436.8366
##  ARIMA(1,1,4)            with drift         : Inf
##  ARIMA(2,1,0)                               : 432.5313
##  ARIMA(2,1,0)            with drift         : 433.449
##  ARIMA(2,1,1)                               : 434.5063
##  ARIMA(2,1,1)            with drift         : 433.7124
##  ARIMA(2,1,2)                               : 436.9823
##  ARIMA(2,1,2)            with drift         : Inf
##  ARIMA(2,1,3)                               : Inf
##  ARIMA(2,1,3)            with drift         : Inf
##  ARIMA(3,1,0)                               : 434.8565
##  ARIMA(3,1,0)            with drift         : 435.9464
##  ARIMA(3,1,1)                               : 436.788
##  ARIMA(3,1,1)            with drift         : Inf
##  ARIMA(3,1,2)                               : Inf
##  ARIMA(3,1,2)            with drift         : Inf
##  ARIMA(4,1,0)                               : 436.5007
##  ARIMA(4,1,0)            with drift         : 437.448
##  ARIMA(4,1,1)                               : Inf
##  ARIMA(4,1,1)            with drift         : 436.8826
##  ARIMA(5,1,0)                               : 436.434
##  ARIMA(5,1,0)            with drift         : 436.606
## 
## 
## 
##  Best model: ARIMA(0,1,1)            with drift
checkresiduals(forecast_ts_train_adj_meantemp)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1) with drift
## Q* = 7.4191, df = 9, p-value = 0.5936
## 
## Model df: 1.   Total lags used: 10
forecast_ts_train_adj_meantemp
## Series: ts_train_adj_meantemp 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##           ma1   drift
##       -0.6827  1.9880
## s.e.   0.1279  1.0526
## 
## sigma^2 = 488.7:  log likelihood = -211.49
## AIC=428.98   AICc=429.54   BIC=434.53
  1. Forecast deseasonalized and detrended time series
forecast_ts_train_adj_det_meantemp = auto.arima(ts_train_adj_det_meantemp,
                            trace = TRUE, 
                            seasonal= FALSE,
                            stepwise=FALSE,
                            approximation=FALSE)
## 
##  ARIMA(0,0,0)            with zero mean     : -183.1965
##  ARIMA(0,0,0)            with non-zero mean : -181.4467
##  ARIMA(0,0,1)            with zero mean     : -195.2457
##  ARIMA(0,0,1)            with non-zero mean : -195.724
##  ARIMA(0,0,2)            with zero mean     : -192.9693
##  ARIMA(0,0,2)            with non-zero mean : -193.439
##  ARIMA(0,0,3)            with zero mean     : -190.7534
##  ARIMA(0,0,3)            with non-zero mean : -191.4021
##  ARIMA(0,0,4)            with zero mean     : -190.4085
##  ARIMA(0,0,4)            with non-zero mean : Inf
##  ARIMA(0,0,5)            with zero mean     : Inf
##  ARIMA(0,0,5)            with non-zero mean : Inf
##  ARIMA(1,0,0)            with zero mean     : -191.9675
##  ARIMA(1,0,0)            with non-zero mean : -190.6383
##  ARIMA(1,0,1)            with zero mean     : -192.971
##  ARIMA(1,0,1)            with non-zero mean : -193.4768
##  ARIMA(1,0,2)            with zero mean     : -190.5819
##  ARIMA(1,0,2)            with non-zero mean : Inf
##  ARIMA(1,0,3)            with zero mean     : -188.4672
##  ARIMA(1,0,3)            with non-zero mean : Inf
##  ARIMA(1,0,4)            with zero mean     : -188.2137
##  ARIMA(1,0,4)            with non-zero mean : Inf
##  ARIMA(2,0,0)            with zero mean     : -192.6467
##  ARIMA(2,0,0)            with non-zero mean : -191.6913
##  ARIMA(2,0,1)            with zero mean     : -190.668
##  ARIMA(2,0,1)            with non-zero mean : -191.4938
##  ARIMA(2,0,2)            with zero mean     : -188.1949
##  ARIMA(2,0,2)            with non-zero mean : Inf
##  ARIMA(2,0,3)            with zero mean     : Inf
##  ARIMA(2,0,3)            with non-zero mean : Inf
##  ARIMA(3,0,0)            with zero mean     : -190.3099
##  ARIMA(3,0,0)            with non-zero mean : -189.1897
##  ARIMA(3,0,1)            with zero mean     : -188.4346
##  ARIMA(3,0,1)            with non-zero mean : Inf
##  ARIMA(3,0,2)            with zero mean     : Inf
##  ARIMA(3,0,2)            with non-zero mean : Inf
##  ARIMA(4,0,0)            with zero mean     : -188.6455
##  ARIMA(4,0,0)            with non-zero mean : -187.6592
##  ARIMA(4,0,1)            with zero mean     : Inf
##  ARIMA(4,0,1)            with non-zero mean : -188.2978
##  ARIMA(5,0,0)            with zero mean     : -188.6625
##  ARIMA(5,0,0)            with non-zero mean : -188.4274
## 
## 
## 
##  Best model: ARIMA(0,0,1)            with non-zero mean
checkresiduals(forecast_ts_train_adj_det_meantemp)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1) with non-zero mean
## Q* = 7.1212, df = 8, p-value = 0.5236
## 
## Model df: 1.   Total lags used: 9
#checkresiduals(forecast_ts_train_meantemp)
forecast_ts_train_adj_det_meantemp
## Series: ts_train_adj_det_meantemp 
## ARIMA(0,0,1) with non-zero mean 
## 
## Coefficients:
##           ma1    mean
##       -0.6984  0.0025
## s.e.   0.1247  0.0013
## 
## sigma^2 = 0.0008149:  log likelihood = 101.14
## AIC=-196.28   AICc=-195.72   BIC=-190.73

7.1.1 Evaluate

Based on the results from the forecast of original data (case 1), we have:

AIC_ARMA <- AIC(forecast_ts_train_meantemp)
AIC_ARMA
## [1] 4789.418
BIC_ARMA <- BIC(forecast_ts_train_meantemp)
BIC_ARMA
## [1] 4819.414

Now I manually compute the following evaluation metrics between prediction and test data: RMSE, MAE, \(R^2\) score.

forecast <- forecast_ts_train_meantemp |> forecast(h=114)
#forecast
predicted <- as.numeric(forecast$mean)
actual <- as.numeric(ts_test_meantemp)
RMSE_ARMA <- rmse(predicted, actual)
RMSE_ARMA
## [1] 4.298832
MAE_ARMA <- mae(predicted, actual)
MAE_ARMA
## [1] 3.575725
rsq <- function (x, y) cor(x, y) ^ 2

RSQ_ARMA <- rsq(actual, predicted)

RSQ_ARMA
## [1] 0.8174112

Two tables will be presented, one reports metrics of the model applied on training set, and the other reports metrics for evaluating predictions based on test set.

d <- cbind(AIC = AIC_ARMA, BIC = BIC_ARMA)
# at most 4 decimal places
knitr::kable(d, digits = 4)
AIC BIC
4789.418 4819.414
d <- cbind(R2 = RSQ_ARMA, RMSE = RMSE_ARMA, MAE = MAE_ARMA)
# at most 4 decimal places
knitr::kable(d, digits = 4)
R2 RMSE MAE
0.8174 4.2988 3.5757

7.1.2 Plot Forecast

  1. Original time series of meantemp
autoplot(forecast(forecast_ts_train_meantemp))# + autolayer(xts_test_meantemp)

length(ts_test_meantemp)
## [1] 114
forecast_ts_train_meantemp |> forecast(h=114) |>
autoplot() + autolayer(ts_test_meantemp)

  1. Original time series of meantemp but aggregated weekly
#autoplot(forecast(ts_week_train_meantemp)) #+ autolayer(ts_week_test_meantemp)
  1. Deseasonalized time series
#forecast_ts_train_adj + ts_train_adj
autoplot(forecast(forecast_ts_train_adj_meantemp))

  1. Deseasonalized and detrended time series
autoplot(forecast(forecast_ts_train_adj_det_meantemp))

Let us illustrate plot of forecasting the test data (using forecast from case 1) joint with test data.

#ts_plot(ts_test_meantemp, forecast$mean)
#ts.union(ts_test_meantemp, forecast$mean)
#forecast$mean

xts_temp <- xts(ts_test_meantemp, order.by=df_test$date, "%Y-%m-%d")
xts_temp_2 <- xts(forecast$mean, order.by=df_test$date, "%Y-%m-%d")
#xts_temp
#xts_temp_2
ts_plot(xts_temp, xts_temp_2)

Q4. I did all unit root tests and ARIMA models on all the following datasets: original time series, deseasonalized time series, and combination of deaseonalized and detrended time series. Judging by the plots, the adjusted versions performed poorly when fed to the model compared to feeding the weekly aggregated data that is fed to AUTOARIMA but with autmatic seasonality modelling of seasonality. Could you please share your viewpoint regarding this?

7.2 Vector autoregressive (VAR)

Now we model a multivariate time series by using both the columns meantemp and wind_speed.

We plot wind_speed time series first:

We use an interactive plot.

p2 <- df_train |>
  ggplot( aes(x=date, y=wind_speed)) +
    geom_area(fill="#69b3a2", alpha=0.5) +
    geom_line(color="#69b3a2") +
    ylab("bitcoin price ($)") +
    theme_ipsum()

# Turn it interactive with ggplotly
p2 <- ggplotly(p2)
#p
p2

Now we use a static plot.

#xts_train_meantemp <- xts(df_train$meantemp, order.by=df_train$date, "%Y-%m-%d")
#ts_train_meantemp <-ts_ts(xts_train_meantemp)

xts_train_windspeed <- xts(df_train$wind_speed, order.by=df_train$date, "%Y-%m-%d")
ts_train_windspeed <-ts_ts(xts_train_windspeed)
ts_plot(ts_train_windspeed)

We must deal with anomalies first.

xts_train_windspeed <- tsclean(xts_train_windspeed)
ts_plot(xts_train_windspeed)

Let us now do the same with test data of wind_speed column, as we need it later for evaluation:

xts_test_windspeed <- xts(df_test$wind_speed, order.by=df_test$date, "%Y-%m-%d")
head(xts_test_windspeed)
##                [,1]
## 2017-01-01 2.743478
## 2017-01-02 2.894444
## 2017-01-03 4.016667
## 2017-01-04 4.545000
## 2017-01-05 3.300000
## 2017-01-06 8.681818
tail(xts_test_windspeed)
##                [,1]
## 2017-04-19  9.02500
## 2017-04-20  5.56250
## 2017-04-21  6.96250
## 2017-04-22  8.89000
## 2017-04-23  9.96250
## 2017-04-24 12.15714
ts_plot(xts_test_windspeed)

We remove anomalies in the test data too:

xts_test_windspeed <- tsclean(xts_test_windspeed)
ts_test_windspeed <- ts_ts(xts_test_windspeed)
ts_plot(xts_test_windspeed)

In what follows, interactive plot of both time series are illustrated:

fig <- plot_ly(df_train, type = 'scatter', mode = 'lines')%>%
  add_trace(x = ~date, y = ~meantemp, name = 'MeanTemp')%>%
  add_trace(x = ~date, y = ~wind_speed, name = 'WindSpeed')%>%
  layout(title = 'custom tick labels',legend=list(title=list(text='variable')),
         xaxis = list(dtick = "M1", tickformat= "%b\n%Y"), width = 2000)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
options(warn = -1)
fig <- fig %>%
  layout(
         xaxis = list(zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = 'ffff',  tickangle = 0),
         yaxis = list(zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = 'ffff'),
         plot_bgcolor='#e5ecf6')


fig

We aggregate the windspeed time series by weekly

# Weekly mean temperature
xts_week_train_windspeed <- apply.weekly(xts_train_windspeed, sum)
ts_week_train_windspeed <- ts_ts(xts_week_train_windspeed)

xts_week_test_windspeed <- apply.weekly(xts_test_windspeed, sum)
ts_week_test_windspeed <- na.remove(ts_ts(xts_week_test_windspeed))
#ts_week_test_windspeed <- as.ts(xts_week_test_windspeed)
ts_week_test_meantemp
## Time Series:
## Start = 2017 
## End = 2017.30938349179 
## Frequency = 54.9479867256584 
##  [1]  15.91304 122.41073  92.34209 103.12740 120.67435 117.87830 109.63056
##  [8] 135.81840 139.83333 150.79487 140.80200 149.57842 188.10000 211.42460
## [15] 201.63632 208.74603 234.58056  32.00000
## attr(,"na.removed")
##  [1]   2   3   4   5   6   7   9  10  11  12  13  14  16  17  18  19  20  21  23
## [20]  24  25  26  27  28  30  31  32  33  34  35  37  38  39  40  41  42  44  45
## [39]  46  47  48  49  51  52  53  54  55  56  58  59  60  61  62  63  65  66  67
## [58]  68  69  70  72  73  74  75  76  77  79  80  81  82  83  84  86  87  88  89
## [77]  90  91  93  94  95  96  97  98 100 101 102 103 104 105 107 108 109 110 111
## [96] 112

Let us plot static plots for them individually as well:

ts_plot(ts_week_train_windspeed)

ts_plot(xts_train_windspeed)

I tried the original data on VAR model, but I think due to many fluctuations and seasonality components, it didn’t yield accurate results. I fed the weekly aggregated data instead, and I detected a significant imporve.

Now we create a union of both time series and store it.

#VAR_data <- ts.union(ts_train_meantemp, ts_train_windspeed)
VAR_data <- ts.union(ts_week_train_meantemp, ts_week_train_windspeed)
colnames(VAR_data) <- cbind("meantemp","wind_speed")
#v1 <- cbind(ts_week_train_meantemp, ts_week_train_windspeed)
#colnames(v1) <- cbind("meantemp","wind_speed")
#lagselect <- VARselect(v1, type = "both")
#lagselect$selection
VAR_data <- na.remove(VAR_data)
#tail(v1)

We look at different lags suggested by different criteria if we use VAR model.

lagselect <- VARselect(VAR_data, season=12, type = "both")
lagselect$selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##     10      6      1     10
lagselect$criteria
##                  1           2           3           4           5           6
## AIC(n)    10.86018    10.81179    10.74853    10.72197    10.70787    10.63106
## HQ(n)     11.06185    11.04034    11.00397    11.00430    11.01708    10.96716
## SC(n)     11.35841    11.37644    11.37961    11.41948    11.47181    11.46143
## FPE(n) 52091.96344 49644.23391 46616.68238 45413.78546 44800.44647 41513.24904
##                  7           8           9          10
## AIC(n)    10.61618    10.62058    10.63627    10.60996
## HQ(n)     10.97918    11.01047    11.05304    11.05362
## SC(n)     11.51298    11.58381    11.66592    11.70605
## FPE(n) 40929.31852 41143.70844 41833.75446 40791.90935

Now that we have merged the column meantemp with wind_speed, we use VAR models with lag to be 10.

VAR_est <- VAR(y = VAR_data, season=8, type="both", p=10)
VAR_est
## 
## VAR Estimation Results:
## ======================= 
## 
## Estimated coefficients for equation meantemp: 
## ============================================= 
## Call:
## meantemp = meantemp.l1 + wind_speed.l1 + meantemp.l2 + wind_speed.l2 + meantemp.l3 + wind_speed.l3 + meantemp.l4 + wind_speed.l4 + meantemp.l5 + wind_speed.l5 + meantemp.l6 + wind_speed.l6 + meantemp.l7 + wind_speed.l7 + meantemp.l8 + wind_speed.l8 + meantemp.l9 + wind_speed.l9 + meantemp.l10 + wind_speed.l10 + const + trend + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 
## 
##    meantemp.l1  wind_speed.l1    meantemp.l2  wind_speed.l2    meantemp.l3 
##    0.689509677   -0.036305048    0.122053939    0.130967121    0.049179176 
##  wind_speed.l3    meantemp.l4  wind_speed.l4    meantemp.l5  wind_speed.l5 
##    0.046573897   -0.002112818    0.110696976    0.238902095   -0.050100942 
##    meantemp.l6  wind_speed.l6    meantemp.l7  wind_speed.l7    meantemp.l8 
##   -0.146334371    0.144786326   -0.113654255   -0.002846842   -0.148801275 
##  wind_speed.l8    meantemp.l9  wind_speed.l9   meantemp.l10 wind_speed.l10 
##    0.149206407    0.074555275   -0.030718020   -0.049045011    0.219351729 
##          const          trend            sd1            sd2            sd3 
##   17.370795528    0.021320033    8.139625406    0.391065418    5.383036420 
##            sd4            sd5            sd6            sd7 
##    1.677946665   10.140458250    0.772150182    0.358336901 
## 
## 
## Estimated coefficients for equation wind_speed: 
## =============================================== 
## Call:
## wind_speed = meantemp.l1 + wind_speed.l1 + meantemp.l2 + wind_speed.l2 + meantemp.l3 + wind_speed.l3 + meantemp.l4 + wind_speed.l4 + meantemp.l5 + wind_speed.l5 + meantemp.l6 + wind_speed.l6 + meantemp.l7 + wind_speed.l7 + meantemp.l8 + wind_speed.l8 + meantemp.l9 + wind_speed.l9 + meantemp.l10 + wind_speed.l10 + const + trend + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 
## 
##    meantemp.l1  wind_speed.l1    meantemp.l2  wind_speed.l2    meantemp.l3 
##    0.069024744    0.184304838    0.131154305   -0.004551666   -0.013554779 
##  wind_speed.l3    meantemp.l4  wind_speed.l4    meantemp.l5  wind_speed.l5 
##    0.128031917   -0.130028419    0.052890178    0.169230094    0.163426050 
##    meantemp.l6  wind_speed.l6    meantemp.l7  wind_speed.l7    meantemp.l8 
##   -0.075300255    0.027147295   -0.251302983   -0.079158189    0.040539603 
##  wind_speed.l8    meantemp.l9  wind_speed.l9   meantemp.l10 wind_speed.l10 
##   -0.037406248    0.047408589    0.141711053   -0.037567017    0.127941422 
##          const          trend            sd1            sd2            sd3 
##   20.605673892    0.019403293   -7.616951554   -0.788899882   -7.446507564 
##            sd4            sd5            sd6            sd7 
##    1.948843755   -0.822851441   -6.381095605   -8.345748018
summary(VAR_est)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: meantemp, wind_speed 
## Deterministic variables: both 
## Sample size: 198 
## Log Likelihood: -1549.651 
## Roots of the characteristic polynomial:
## 0.9788 0.9788 0.9048 0.9048 0.877 0.877 0.8366 0.8366 0.8299 0.8299 0.8002 0.8002 0.7419 0.7419 0.6714 0.654 0.654 0.5891 0.5891 0.1924
## Call:
## VAR(y = VAR_data, p = 10, type = "both", season = 8L)
## 
## 
## Estimation results for equation meantemp: 
## ========================================= 
## meantemp = meantemp.l1 + wind_speed.l1 + meantemp.l2 + wind_speed.l2 + meantemp.l3 + wind_speed.l3 + meantemp.l4 + wind_speed.l4 + meantemp.l5 + wind_speed.l5 + meantemp.l6 + wind_speed.l6 + meantemp.l7 + wind_speed.l7 + meantemp.l8 + wind_speed.l8 + meantemp.l9 + wind_speed.l9 + meantemp.l10 + wind_speed.l10 + const + trend + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 
## 
##                 Estimate Std. Error t value Pr(>|t|)    
## meantemp.l1     0.689510   0.078189   8.818 1.38e-15 ***
## wind_speed.l1  -0.036305   0.067738  -0.536 0.592687    
## meantemp.l2     0.122054   0.093322   1.308 0.192691    
## wind_speed.l2   0.130967   0.066944   1.956 0.052068 .  
## meantemp.l3     0.049179   0.093036   0.529 0.597774    
## wind_speed.l3   0.046574   0.066730   0.698 0.486168    
## meantemp.l4    -0.002113   0.092811  -0.023 0.981865    
## wind_speed.l4   0.110697   0.066391   1.667 0.097299 .  
## meantemp.l5     0.238902   0.092815   2.574 0.010912 *  
## wind_speed.l5  -0.050101   0.066979  -0.748 0.455495    
## meantemp.l6    -0.146334   0.091951  -1.591 0.113380    
## wind_speed.l6   0.144786   0.066433   2.179 0.030683 *  
## meantemp.l7    -0.113654   0.091656  -1.240 0.216691    
## wind_speed.l7  -0.002847   0.066785  -0.043 0.966049    
## meantemp.l8    -0.148801   0.092809  -1.603 0.110736    
## wind_speed.l8   0.149206   0.066382   2.248 0.025890 *  
## meantemp.l9     0.074555   0.091103   0.818 0.414301    
## wind_speed.l9  -0.030718   0.066459  -0.462 0.644525    
## meantemp.l10   -0.049045   0.072711  -0.675 0.500904    
## wind_speed.l10  0.219352   0.067240   3.262 0.001337 ** 
## const          17.370796   4.941341   3.515 0.000564 ***
## trend           0.021320   0.016078   1.326 0.186623    
## sd1             8.139625   3.773365   2.157 0.032409 *  
## sd2             0.391065   3.684293   0.106 0.915594    
## sd3             5.383036   3.758591   1.432 0.153935    
## sd4             1.677947   3.559751   0.471 0.637987    
## sd5            10.140458   3.762532   2.695 0.007747 ** 
## sd6             0.772150   3.686816   0.209 0.834361    
## sd7             0.358337   3.714385   0.096 0.923259    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 12.26 on 169 degrees of freedom
## Multiple R-Squared: 0.9453,  Adjusted R-squared: 0.9362 
## F-statistic: 104.2 on 28 and 169 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation wind_speed: 
## =========================================== 
## wind_speed = meantemp.l1 + wind_speed.l1 + meantemp.l2 + wind_speed.l2 + meantemp.l3 + wind_speed.l3 + meantemp.l4 + wind_speed.l4 + meantemp.l5 + wind_speed.l5 + meantemp.l6 + wind_speed.l6 + meantemp.l7 + wind_speed.l7 + meantemp.l8 + wind_speed.l8 + meantemp.l9 + wind_speed.l9 + meantemp.l10 + wind_speed.l10 + const + trend + sd1 + sd2 + sd3 + sd4 + sd5 + sd6 + sd7 
## 
##                 Estimate Std. Error t value Pr(>|t|)    
## meantemp.l1     0.069025   0.092992   0.742 0.458956    
## wind_speed.l1   0.184305   0.080562   2.288 0.023391 *  
## meantemp.l2     0.131154   0.110990   1.182 0.238993    
## wind_speed.l2  -0.004552   0.079617  -0.057 0.954478    
## meantemp.l3    -0.013555   0.110649  -0.123 0.902647    
## wind_speed.l3   0.128032   0.079363   1.613 0.108556    
## meantemp.l4    -0.130028   0.110382  -1.178 0.240458    
## wind_speed.l4   0.052890   0.078960   0.670 0.503879    
## meantemp.l5     0.169230   0.110387   1.533 0.127130    
## wind_speed.l5   0.163426   0.079660   2.052 0.041756 *  
## meantemp.l6    -0.075300   0.109359  -0.689 0.492044    
## wind_speed.l6   0.027147   0.079010   0.344 0.731579    
## meantemp.l7    -0.251303   0.109008  -2.305 0.022362 *  
## wind_speed.l7  -0.079158   0.079429  -0.997 0.320386    
## meantemp.l8     0.040540   0.110380   0.367 0.713874    
## wind_speed.l8  -0.037406   0.078950  -0.474 0.636255    
## meantemp.l9     0.047409   0.108350   0.438 0.662271    
## wind_speed.l9   0.141711   0.079041   1.793 0.074780 .  
## meantemp.l10   -0.037567   0.086477  -0.434 0.664540    
## wind_speed.l10  0.127941   0.079970   1.600 0.111495    
## const          20.605674   5.876827   3.506 0.000582 ***
## trend           0.019403   0.019122   1.015 0.311698    
## sd1            -7.616952   4.487732  -1.697 0.091484 .  
## sd2            -0.788900   4.381796  -0.180 0.857337    
## sd3            -7.446508   4.470160  -1.666 0.097601 .  
## sd4             1.948844   4.233676   0.460 0.645879    
## sd5            -0.822851   4.474847  -0.184 0.854325    
## sd6            -6.381096   4.384797  -1.455 0.147448    
## sd7            -8.345748   4.417585  -1.889 0.060576 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 14.58 on 169 degrees of freedom
## Multiple R-Squared: 0.5074,  Adjusted R-squared: 0.4258 
## F-statistic: 6.217 on 28 and 169 DF,  p-value: 8.952e-15 
## 
## 
## 
## Covariance matrix of residuals:
##            meantemp wind_speed
## meantemp     150.37      49.25
## wind_speed    49.25     212.69
## 
## Correlation matrix of residuals:
##            meantemp wind_speed
## meantemp     1.0000     0.2754
## wind_speed   0.2754     1.0000
summary(VAR_est$varresult)
##            Length Class Mode
## meantemp   12     lm    list
## wind_speed 12     lm    list

7.2.1 Evaluate

Based on the model summary, we can look at the value of metrics obtained by the model when we use lag 10:

lagselect$criteria[,10]
##      AIC(n)       HQ(n)       SC(n)      FPE(n) 
##    10.60996    11.05362    11.70605 40791.90935

The \(R^2\) score of the model after applying can also be reported for both of the time series used:

VAR_meantemp_adjr <- summary(VAR_est$varresult$meantemp)$adj.r.squared
VAR_meantemp_adjr
## [1] 0.9361864
VAR_windspeed_adjr <- summary(VAR_est$varresult$wind_speed)$adj.r.squared
VAR_windspeed_adjr
## [1] 0.4258064

We test that the residuals are uncorrelated using a Portmanteau test.

serial.test(VAR_est, lags.pt=10, type="PT.asymptotic")
## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object VAR_est
## Chi-squared = 7.3153, df = 0, p-value < 2.2e-16
forecasts <- predict(VAR_est, h=114)
forecast <- VAR_est |> forecast(h=18)

Now we use test data to evaluate predictions:

predicted_meantemp <- as.numeric(forecast[2]$forecast$meantemp$mean)
actual_meantemp <- as.numeric(ts_week_test_meantemp)

predicted_windspeed <- as.numeric(forecast[2]$forecast$wind_speed$mean)
actual_winspeed <- as.numeric(ts_week_test_windspeed)
RMSE_meantemp_VAR <- rmse(predicted_meantemp, actual_meantemp)
RMSE_meantemp_VAR
## [1] 73.53212
RMSE_windspeed_VAR <- rmse(predicted_windspeed, actual_winspeed)
RMSE_windspeed_VAR
## [1] 19.90842
MAE_meantemp_VAR <- mae(predicted_meantemp, actual_meantemp)
MAE_meantemp_VAR
## [1] 57.97236
MAE_windspeed_VAR <- mae(predicted_windspeed, actual_winspeed)
MAE_windspeed_VAR
## [1] 14.47499
rsq <- function (x, y) cor(x, y) ^ 2
RSQ_meantemp_VAR <- rsq(predicted_meantemp, actual_meantemp)
RSQ_meantemp_VAR
## [1] 0.3756512
RSQ_windspeed_VAR <- rsq(predicted_windspeed, actual_winspeed)
RSQ_windspeed_VAR
## [1] 0.1374916

7.2.1.1 meantemp

d <- cbind(Adjusted_R2 = VAR_meantemp_adjr, AIC = lagselect$criteria[,10][0])
# at most 4 decimal places
knitr::kable(d, digits = 4)
Adjusted_R2
0.9362
d <- cbind(R2 = RSQ_meantemp_VAR, RMSE = RMSE_meantemp_VAR, MAE = MAE_meantemp_VAR)
# at most 4 decimal places
knitr::kable(d, digits = 4)
R2 RMSE MAE
0.3757 73.5321 57.9724

7.2.1.2 windpseed

d <- cbind(Adjusted_R2 = VAR_meantemp_adjr, AIC = lagselect$criteria[,10][0])
# at most 4 decimal places
knitr::kable(d, digits = 4)
Adjusted_R2
0.9362
d <- cbind(R2 = RSQ_windspeed_VAR, RMSE = RMSE_windspeed_VAR, MAE = MAE_windspeed_VAR)
# at most 4 decimal places
knitr::kable(d, digits = 4)
R2 RMSE MAE
0.1375 19.9084 14.475

7.2.2 Plot Forecast

First we plot forecasts based on the model being trained on the training data.

plot(forecasts)

forecast[2]$forecast$meantemp |> autoplot() + autolayer(ts_week_test_meantemp)

forecast[2]$forecast$wind_speed |>  autoplot() + autolayer(ts_week_test_windspeed)

Then, we plot the prediction of test data alongside the actual test data.

ts_plot(forecast[2]$forecast$meantemp$mean, ts_week_test_meantemp)

ts_plot(forecast[2]$forecast$wind_speed$mean, ts_week_test_windspeed)

7.2.3 Granger Causality

Granger_meantemp <- causality(VAR_est, cause = "meantemp")
Granger_meantemp
## $Granger
## 
##  Granger causality H0: meantemp do not Granger-cause wind_speed
## 
## data:  VAR object VAR_est
## F-Test = 3.0682, df1 = 10, df2 = 338, p-value = 0.0009546
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: meantemp and wind_speed
## 
## data:  VAR object VAR_est
## Chi-squared = 13.96, df = 1, p-value = 0.0001867
Granger_windspeed <- causality(VAR_est, cause = "wind_speed")
Granger_windspeed
## $Granger
## 
##  Granger causality H0: wind_speed do not Granger-cause meantemp
## 
## data:  VAR object VAR_est
## F-Test = 2.5126, df1 = 10, df2 = 338, p-value = 0.006333
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: wind_speed and meantemp
## 
## data:  VAR object VAR_est
## Chi-squared = 13.96, df = 1, p-value = 0.0001867

7.2.4 Forecast Error Variance Decomposition (FEVD)

FEVD1 <- fevd(VAR_est, n.ahead = 50)
FEVD1
## $meantemp
##        meantemp  wind_speed
##  [1,] 1.0000000 0.000000000
##  [2,] 0.9988206 0.001179378
##  [3,] 0.9921833 0.007816717
##  [4,] 0.9830339 0.016966062
##  [5,] 0.9614487 0.038551268
##  [6,] 0.9588092 0.041190775
##  [7,] 0.9377211 0.062278922
##  [8,] 0.9134910 0.086509021
##  [9,] 0.8625721 0.137427893
## [10,] 0.8254314 0.174568631
## [11,] 0.7681499 0.231850119
## [12,] 0.7173018 0.282698208
## [13,] 0.6756769 0.324323057
## [14,] 0.6306769 0.369323114
## [15,] 0.5968982 0.403101815
## [16,] 0.5643771 0.435622867
## [17,] 0.5380624 0.461937591
## [18,] 0.5178299 0.482170088
## [19,] 0.5033850 0.496614955
## [20,] 0.4933082 0.506691830
## [21,] 0.4867732 0.513226817
## [22,] 0.4843880 0.515611965
## [23,] 0.4855313 0.514468711
## [24,] 0.4899476 0.510052413
## [25,] 0.4961818 0.503818245
## [26,] 0.5030985 0.496901511
## [27,] 0.5102635 0.489736522
## [28,] 0.5170835 0.482916510
## [29,] 0.5228099 0.477190125
## [30,] 0.5271072 0.472892785
## [31,] 0.5294044 0.470595590
## [32,] 0.5297320 0.470268018
## [33,] 0.5281879 0.471812079
## [34,] 0.5249426 0.475057380
## [35,] 0.5204132 0.479586819
## [36,] 0.5149487 0.485051263
## [37,] 0.5088834 0.491116581
## [38,] 0.5025700 0.497429955
## [39,] 0.4964771 0.503522861
## [40,] 0.4909221 0.509077920
## [41,] 0.4861755 0.513824529
## [42,] 0.4823865 0.517613461
## [43,] 0.4796081 0.520391866
## [44,] 0.4779120 0.522088020
## [45,] 0.4772869 0.522713081
## [46,] 0.4776174 0.522382632
## [47,] 0.4787252 0.521274770
## [48,] 0.4804009 0.519599086
## [49,] 0.4824317 0.517568315
## [50,] 0.4846137 0.515386330
## 
## $wind_speed
##         meantemp wind_speed
##  [1,] 0.07585475  0.9241452
##  [2,] 0.08405805  0.9159420
##  [3,] 0.10823066  0.8917693
##  [4,] 0.12850993  0.8714901
##  [5,] 0.12843197  0.8715680
##  [6,] 0.16336265  0.8366374
##  [7,] 0.17888075  0.8211192
##  [8,] 0.18094166  0.8190583
##  [9,] 0.18051164  0.8194884
## [10,] 0.17419141  0.8258086
## [11,] 0.16695287  0.8330471
## [12,] 0.16518175  0.8348183
## [13,] 0.16507974  0.8349203
## [14,] 0.16645482  0.8335452
## [15,] 0.16453433  0.8354657
## [16,] 0.16225321  0.8377468
## [17,] 0.16407736  0.8359226
## [18,] 0.17090974  0.8290903
## [19,] 0.17604569  0.8239543
## [20,] 0.17960182  0.8203982
## [21,] 0.18265705  0.8173429
## [22,] 0.18690905  0.8130910
## [23,] 0.19438529  0.8056147
## [24,] 0.20045512  0.7995449
## [25,] 0.20333657  0.7966634
## [26,] 0.20576390  0.7942361
## [27,] 0.20819005  0.7918100
## [28,] 0.21018765  0.7898124
## [29,] 0.21158046  0.7884195
## [30,] 0.21171277  0.7882872
## [31,] 0.21107900  0.7889210
## [32,] 0.21025594  0.7897441
## [33,] 0.20925443  0.7907456
## [34,] 0.20816943  0.7918306
## [35,] 0.20728517  0.7927148
## [36,] 0.20647236  0.7935276
## [37,] 0.20573784  0.7942622
## [38,] 0.20529778  0.7947022
## [39,] 0.20534235  0.7946577
## [40,] 0.20599918  0.7940008
## [41,] 0.20698159  0.7930184
## [42,] 0.20802076  0.7919792
## [43,] 0.20925384  0.7907462
## [44,] 0.21078009  0.7892199
## [45,] 0.21247569  0.7875243
## [46,] 0.21413925  0.7858607
## [47,] 0.21558787  0.7844121
## [48,] 0.21681327  0.7831867
## [49,] 0.21789182  0.7821082
## [50,] 0.21877383  0.7812262
plot(FEVD1)

7.3 Neural Networks

7.3.1 Forecast

7.3.1.1 wind_speed

set.seed(34)
# nnetar() requires a numeric vector or time series object as
# input ?nnetar() can be seen for more info on the function
# nnetar() by default fits multiple neural net models and
# gives averaged results xreg option allows for only numeric
# vectors in nnetar() function
fit_windspeed = nnetar(ts_train_windspeed)
fit_windspeed
## Series: ts_train_windspeed 
## Model:  NNAR(1,1,2)[365] 
## Call:   nnetar(y = ts_train_windspeed)
## 
## Average of 20 networks, each of which is
## a 2-2-1 network with 9 weights
## options were - linear output units 
## 
## sigma^2 estimated as 14.78
forecast_windspeed <- forecast(fit_windspeed, h = 114, PI = T)
#forecast_windspeed

7.3.1.2 meantemp

fit_meantemp = nnetar(ts_train_meantemp)
fit_meantemp
## Series: ts_train_meantemp 
## Model:  NNAR(11,1,6)[365] 
## Call:   nnetar(y = ts_train_meantemp)
## 
## Average of 20 networks, each of which is
## a 12-6-1 network with 85 weights
## options were - linear output units 
## 
## sigma^2 estimated as 1.884
forecast_meantemp <- forecast(fit_meantemp, h = 114, PI = T)
#forecast_meantemp

7.3.2 Evluate

7.3.2.1 meantemp

predicted <- as.numeric(forecast_meantemp$mean)
actual <- as.numeric(ts_test_meantemp)
RMSE_meantemp_NN <- rmse(predicted, actual)
RMSE_meantemp_NN
## [1] 3.134642
MAE_meantemp_NN <- mae(predicted, actual)
MAE_meantemp_NN
## [1] 2.456404
rsq <- function (x, y) cor(x, y) ^ 2

RSQ_meantemp_NN <- rsq(actual, predicted)
RSQ_meantemp_NN
## [1] 0.7723737

7.3.2.2 wind_speed

predicted <- as.numeric(forecast_windspeed$mean)
actual <- as.numeric(ts_test_windspeed)
RMSE_windspeed_NN <- rmse(predicted, actual)
RMSE_windspeed_NN
## [1] 3.544034
MAE_windspeed_NN <- mae(predicted, actual)
MAE_windspeed_NN
## [1] 2.761422
RSQ_windspeed_NN <- rsq(actual, predicted)
RSQ_windspeed_NN
## [1] 0.05910477

Now present tables of report both of time series. #### meantemp

d <- cbind(R2 = RSQ_meantemp_NN, RMSE = RMSE_meantemp_NN, MAE = MAE_meantemp_NN)
# at most 4 decimal places
knitr::kable(d, digits = 4)
R2 RMSE MAE
0.7724 3.1346 2.4564

7.3.2.3 windpseed

d <- cbind(R2 = RSQ_windspeed_NN, RMSE = RMSE_windspeed_NN, MAE = MAE_windspeed_NN)
# at most 4 decimal places
knitr::kable(d, digits = 4)
R2 RMSE MAE
0.0591 3.544 2.7614

7.3.3 Forecast Plots

First we plot forecasts based on the model being trained on the training data.

forecast_windspeed |> autoplot() + autolayer(ts_test_windspeed)

forecast_meantemp |> autoplot() + autolayer(ts_test_meantemp)

Then, we plot the prediction of test data alongside the actual test data.

xts_temp <- xts(ts_test_meantemp, order.by=df_test$date, "%Y-%m-%d")
xts_temp_2 <- xts(forecast_meantemp$mean, order.by=df_test$date, "%Y-%m-%d")

ts_plot(xts_temp, xts_temp_2)

xts_temp <- xts(ts_test_windspeed, order.by=df_test$date, "%Y-%m-%d")
xts_temp_2 <- xts(forecast_windspeed$mean, order.by=df_test$date, "%Y-%m-%d")

ts_plot(xts_temp, xts_temp_2)

7.4 Results

metrics_list <- c("RMSE", "MAE", "RSQ")
ARMA_meantemp <- c(RMSE_ARMA, MAE_ARMA, RSQ_ARMA)

VAR_meantemp <- c(RMSE_meantemp_VAR, MAE_meantemp_VAR, RSQ_meantemp_VAR)
VAR_windspeed <- c(RMSE_windspeed_VAR, MAE_windspeed_VAR, RSQ_windspeed_VAR)

NN_meantemp <- c(RMSE_meantemp_NN, MAE_meantemp_NN, RSQ_meantemp_NN)
NN_windspeed <- c(RMSE_windspeed_NN, MAE_windspeed_NN, RSQ_windspeed_NN)


df_eval <- data.frame(metrics_list, ARMA_meantemp, VAR_meantemp, VAR_windspeed, NN_meantemp, NN_windspeed)

Note that for the SARMA and Nueral Network models, the original data is used, while weekly aggregated data is used for VAR model.

#table(df_eval) |> htmlTable
#knitr::kable(df_eval, col.names = names(df_eval))
#df_eval
htmlTable(df_eval,
          cgroup = c("Metrics","ARMA","VAR","NN"),
          n.cgroup = c(1,1,2,2),
          digits = 3
)
Metrics   ARMA   VAR   NN
metrics_list   ARMA_meantemp   VAR_meantemp VAR_windspeed   NN_meantemp NN_windspeed
1 RMSE   4.29883191290669   73.5321192126263 19.9084228665595   3.13464211657407 3.54403427490557
2 MAE   3.57572474235466   57.9723550576547 14.474990577398   2.4564038303364 2.76142222794668
3 RSQ   0.817411155319528   0.375651236191758 0.137491629553995   0.772373716756089 0.0591047717479771